RELATED APPLICATIONS

The present application claims the benefit of U.S.
provisional patent application serial number 60/160,334,
filed October 19, 1999, which is incorporated herein by
reference.

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The invention relates to the field of conversion of 
data from one format to another within a digital data 
processing device or devices. 

2. Background of the Invention

XML ("Extensible Markup Language") is a proposed
standard for exchanging semi-structured data. It can be
used as an alternative to HTML. More information about
XML can be found in "Extensible Markup Language (XML) 1.0:
W3C Recommendation 10-February-1998",
http://www.w3.org/TR/REC-xml, and in E.R. Harold, XML:
Extensible Markup Language (IDG Books 1998).

It is expected that the consumption of XML documents 
will continue to grow. Business entities increasingly 
exchange XML documents as part of their business logic flow. 
Several technical and business organizations have published
XML schemes for key document types in specific domains. Web
sites such as www.xml.org have been established to 
coordinate such activities and maintain XML schemes. 
Existing and new applications also increasingly use XML as 
their input and output format. Major software vendors, such 
as those of browsers and relational and object databases, 
have either provided or announced support for the XML 
format, while many applications are being enhanced with XML 
capabilities. If data must be taken manually from other
data sources and integrated into XML documents, then data
exchange is slowed.

In A. Deutsch et al., "Storing Semistructured Data with
STORED", SIGMOD '99, International Conf. Management of Data,
Philadelphia PA (ACM 1999), pp. 431-442, a language is
proposed for mapping data from relational databases to XML.
This technique has the disadvantage that, since it uses
relational query constructs directly in the mapping
language, it can apply only to relational databases.

SUMMARY OF THE INVENTION 

It is an object of the invention to create a mapping 
suitable for mapping from several types of data sources to 
XML. 

Y0999-429 -3- 



This object is achieved by use of a mapping that
establishes a correspondence between entities in a data
source on the one hand and lists and scalars on the other
hand. The mapping language maps the lists and scalars to XML
elements and attributes. For the purpose of this application,
a scalar is a single value and a list is a list of values.

Preferably the mapping involves a mapping language
having two types of statements: value specifications and
binding specifications.

Preferably also the mapping language is insertable
directly in a DTD for a target XML document.

Other objects and advantages shall be apparent from the
following.

BRIEF DESCRIPTION OF THE DRAWING

The invention will now be described by way of
non-limiting example with reference to the following
figures:

Fig. 1 shows a digital data processing system on which
the invention can be implemented.

Fig. 2 shows an overview of the function of the
invention in context.

Fig. 3a shows an example purchase order relational
schema.

Fig. 3b shows an example DTD.

Fig. 4 shows conceptually a mapping between a table and
a displayed version of an XML document.

Fig. 5 shows an example of an annotated DTD ("DTDSA")
in accordance with the invention.

Figs. 6a & b show an algorithm for establishing the
mapping in DTDSA format.

Figs. 7a & b show an algorithm for generating XML using
a DTDSA.

Fig. 8 shows an XML composition data flow.

Fig. 9 shows an XML composition example with input
x=100.

Fig. 10 shows a retrieved XML document (with input
x=100).

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 
Definitions

In the column "Name given in", if the value is "here", that
means the concept is used in the present document but not
defined in the DTD spec. If the value is DTD, that means
the concept is referenced, sometimes appears in a production
rule, but not explicitly defined in the DTD spec.



Note: Some of the terms are defined in a recursive manner. 



Term                        Shorthand      Definition                                      Name given in
--------------------------  -------------  ----------------------------------------------  -------------
repetition symbol                          '?', '*', or '+'                                here

element name                ENAME          Name used in an element type declaration        XML Spec

choice list                 CLIST          a list of cp enclosed by '(' and ')' and        XML Spec
                                           separated by '|', i.e.
                                           "( cp | cp | ... | cp )"

terminal choice list        TCLIST         a list of "#PCDATA" and enames, each            here
                                           appearing only once, enclosed by '(' and
                                           ')' and separated by '|', i.e.
                                           "( #PCDATA | ENAME | ENAME ... | ENAME )"

sequence                                   a list of cp enclosed by '(' and ')' and        XML Spec
                                           separated by ',', i.e.
                                           "( cp, cp, ..., cp )"

content unit                               ENAME, CLIST, SEQ, or TCLIST                    here

content particle            CP             a content unit followed optionally by a         XML Spec
                                           repetition symbol, i.e.
                                           (Name | choice | seq) ('?' | '*' | '+')?

content spec                content-spec   the part that matches "contentspec" in the      XML Spec
                                           DTD production rules. That is, the part
                                           that follows ENAME and precedes '>' in a
                                           DTD element type declaration.

children content spec       children       a content spec that is a choice or              XML Spec
                                           sequence content unit followed optionally
                                           by a repetition symbol

PCDATA declaration          PCDATA         #PCDATA                                         XML Spec

attribute definition        ATD            The part that includes a name, a type,          XML Spec
                                           e.g., CDATA, ID, IDREF etc., and a default
                                           declaration.

value declaration                          PCDATA declaration or attribute definition.     here

element type declaration    ED             The part that includes a "<!ELEMENT"            here
                                           followed by an ENAME, a content-spec, and
                                           a '>'

attribute list declaration  AD             The part that includes a "<!ATTLIST",           here
                                           followed by an ENAME, a list of ATDs, and
                                           a '>'

DTD declaration                            element type declaration and attribute list     here
                                           declaration

DTD construct                              a DTD declaration, a (sub-expression of a)      here
                                           content spec, or a (sub-expression of an)
                                           attribute-list declaration


Fig. 1 shows a digital data processing system on which
the invention can be implemented. The system will typically
include a CPU 104, a memory device 106, a display device
101, data entry devices such as keyboard 102 and mouse 103,
and a network connection 105. The CPU might be any kind of
processor such as a PC, any other general purpose processor,
parallel processing device, or distributed processing
system. The memory device might be of any sort, such as a
hard drive, a floppy drive, a zip drive, a CD-ROM drive, or
several such devices. Other devices for communication with
a user might also be attached.

The network connection will commonly be an Internet 
connection, but it might also be an Intranet or other local 
network connection, such as a LAN. Both a local and an 
external network connection might be present. 

The memory device 106 will commonly house data and 
software. The data will include data that a user may seek
to communicate with the outside world or data to be used
internally. The software might be of various sorts,
including software for implementing the invention. However,
the invention might also be implemented in hardware.

While the system shown has a local memory device 106, 
memory accessible via the network connection 105 might be 
used instead of a local memory device. Similarly, the CPU 
104 might be remote from the display 101 and data entry
devices 102 and 103. 

A user might seek to communicate data to the external
world under many different circumstances.

For instance, suppose a user tracks an inventory of
supplies in a relational database within memory device 106.
The database program will signal to the user when some 
inventory item, such as pencils, becomes low. The user may 
then wish to order the low inventory item via the Internet. 
The order will typically be expected to be conveyed to the 
supplier in a standard format, such as an XML purchase order 
form. The user might fill out the XML purchase order form 
manually, but this could become burdensome if frequent 
orders are to be undertaken. It would be desirable for the 
CPU 104 to convert the low inventory information from the 
relational database directly onto the standard XML purchase 
order form. When the inventory items arrive, it would also 
be desirable for the CPU 104 to convert a standard XML 
invoice form into relational database information to be 
stored in the memory device 106. 

Another situation where conversion of data might be
desirable would arise in compiling web pages. A
stockbroker, for example, might maintain a first data base
with a customer's investment portfolio information, a second
data base with stock quotes, and a third data base with
financial analysis information. The stockbroker might want
to select and consolidate information from all three data
bases to create customized customer web pages, where
individual customers could view investment advice. Again,
the CPU should automatically convert data from the data
bases into an XML document that is displayable as a web
page.

The data to be converted need not be from a relational
database. It might equally well follow an object-oriented,
semi-structured, or other schema. Using this framework,
one DTD can correspond to multiple heterogeneous data 
sources, and single data sources may be associated with many 
different DTD's. 

Those of ordinary skill in the art might recognize any 
number of other situations where conversion of data into 
XML would be desirable. 

Fig. 2 shows a conceptual diagram of the role of the 
invention. On the left side of the figure, in known 
fashion, a schema 201 is used to control formatting of a
data set 202, such as a relational database. On the right
side of the figure, also in known fashion, a DTD 203 is used 
to control formatting of an XML document 204. In one 
aspect, the invention 205 is designed to use the schema 201, 
the data set 202, and the DTD 203 to create an XML document 
204. Fig. 2 is only an example. The invention is 
designed to allow conversion between any data format and 
XML. 

FIG. 3A illustratively includes four relational tables, 
also known as a relational schema, purchase order ("PO") 
305, company 310, lineitem 315, and product 320.

Table 305 has three columns: purchase order
identification ("POID"), buyer, and seller. The rows of the
table have numerical index values pointing to values for the 
columns. Thus purchase order number 100 is associated with 
buyer 20 and seller 10. 

Table 310 has three columns: company identification
("COID"), name, and address ("ADDR"). The rows associate
numerical values with actual company names and addresses. 
Thus the numerical value 10 is associated with the company 
IBM, having an address in New York, and the numerical value 
20 is associated with the company Citibank, also having an 
address in New York. 

Table 315 has three columns: POID, product
identification ("PRODID"), and amount. The rows, 330 and
335, associate purchase order identification numbers with 
product identification numbers and quantities. In the 
figure, purchase order 100 is associated with two product 
identifications, 35678 and 35694, of which 20k and 100k are 
ordered respectively. 

Table 320 has three columns: PRODID, name, and desc
(description). The rows associate product identification
35678 with a "THINKPAD"® and product identification 35694
with a server.

Arrows in Fig. 3a illustrate foreign key relations
among various fields. For example, the record 325 in the PO
table with POID=100 is related via arrows 340 and 345 to two
records 330, 335 in the lineitem table 315 with POID=100.
Similarly, records 330 and 335 are associated via arrow 350
with records 355 and 360.
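
By way of non-limiting illustration, the four tables and their foreign key relations can be sketched with Python's standard sqlite3 module. The lower-case table and column names, the column "descr" (used because DESC is a reserved word in SQL), and the product names and descriptions are assumptions made for this sketch; only the identifiers, company names, addresses, and amounts come from the description of Fig. 3a above.

```python
import sqlite3

def build_purchase_order_db():
    """Create an in-memory database holding the four tables described for Fig. 3a."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE company  (coid INTEGER PRIMARY KEY, name TEXT, addr TEXT);
        CREATE TABLE product  (prodid INTEGER PRIMARY KEY, name TEXT, descr TEXT);
        CREATE TABLE po       (poid INTEGER PRIMARY KEY,
                               buyer  INTEGER REFERENCES company(coid),
                               seller INTEGER REFERENCES company(coid));
        CREATE TABLE lineitem (poid   INTEGER REFERENCES po(poid),
                               prodid INTEGER REFERENCES product(prodid),
                               amount TEXT);
        INSERT INTO company  VALUES (10, 'IBM', 'New York');
        INSERT INTO company  VALUES (20, 'Citibank', 'New York');
        -- Product names and descriptions are placeholders for this sketch.
        INSERT INTO product  VALUES (35678, 'THINKPAD', 'placeholder');
        INSERT INTO product  VALUES (35694, 'SERVER', 'placeholder');
        INSERT INTO po       VALUES (100, 20, 10);
        INSERT INTO lineitem VALUES (100, 35678, '20k');
        INSERT INTO lineitem VALUES (100, 35694, '100k');
    """)
    return db

def line_items(db, poid):
    """Follow the lineitem -> product foreign key for one purchase order."""
    rows = db.execute(
        "SELECT p.name, l.amount FROM lineitem l "
        "JOIN product p ON p.prodid = l.prodid "
        "WHERE l.poid = ? ORDER BY l.prodid", (poid,))
    return rows.fetchall()

def buyer_name(db, poid):
    """Follow the po -> company foreign key for the buyer of one purchase order."""
    row = db.execute(
        "SELECT c.name FROM po JOIN company c ON c.coid = po.buyer "
        "WHERE po.poid = ?", (poid,)).fetchone()
    return row[0] if row else None
```

Each helper follows one of the arrows of Fig. 3a: line_items joins lineitem to product, and buyer_name joins PO to company.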

Fig. 3b shows a Document Type Definition ("DTD"). 
According to the known art, XML makes use of DTD's to 
specify documents. DTD's are very flexible and can specify 
any number of different documents. Fig. 3b shows only one 
simple example, in which a purchase order is specified. 

Line 301 shows the definition of the variable PO. In a 
tree-like fashion, the definition incorporates child 
definitions, i.e. "id" defined at line 302, "buyer" defined 
at line 303, "seller" defined at line 304, and "lineitem" 
defined at line 307. The asterisk after "lineitem" at 320 
indicates that this feature may be repeated any number of 
times in the purchase order. The definitions of "id" 302,
"address" 311, "prodname" 303, and "amount" 309 use the
#PCDATA command to get data directly from a data storage
device, e.g. 106. The definitions of "buyer" and "seller"
have attribute lists at 323 and 324. The definition of
lineitem also incorporates child definitions, "prodname" at
line 308 and "amount" at line 310.

Fig. 4 shows conceptually how data is to be mapped from
a relational database into an XML document. Data, such as
that referred to in the DTD of Fig. 3b, is stored in a
relational database in the form of tables 401. The tables
have rows and columns, the columns being numbered 1, 2, 3,
and 4 in the example. The database information is to be
displayed in the form of fields A, B, C, D in the XML
document 403. A mapping 402 sends data from the database to
the document and back. A mapping might specify A <-> 1; B
<-> 4; C <-> 2; and D <-> 3, or a mapping might specify
some other correspondence such as A <-> 3; B <-> 4; C <-> 2;
D <-> 1.
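
As a minimal sketch of such a correspondence, the first mapping above (A <-> 1; B <-> 4; C <-> 2; D <-> 3) can be held in a dictionary and applied in both directions. The function names and the enclosing tag "record" are hypothetical and chosen only for this illustration; they are not part of the mapping language described below.

```python
from xml.sax.saxutils import escape

# Mapping 402 from the text: XML field name -> 1-based table column number.
FIELD_TO_COLUMN = {"A": 1, "B": 4, "C": 2, "D": 3}

def row_to_xml(row, mapping, tag="record"):
    """Render one table row as an XML element using the field/column mapping."""
    parts = [
        f"<{name}>{escape(str(row[col - 1]))}</{name}>"
        for name, col in mapping.items()
    ]
    return f"<{tag}>" + "".join(parts) + f"</{tag}>"

def fields_to_row(fields, mapping):
    """Inverse direction: place field values back into column order."""
    row = [None] * len(mapping)
    for name, col in mapping.items():
        row[col - 1] = fields[name]
    return row
```

Swapping in the second correspondence only requires a different dictionary, which is the sense in which the mapping is kept separate from both the table and the document.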

In order to achieve such mappings, a mapping language
is proposed. Preferably this mapping language is stored as
annotations to the DTD. These annotations can be stored in
the same computer file with the DTD or in a separate file. If
the annotations are stored with the DTD, then they can be
stripped off by a simple program prior to generating an XML
document.

FIG. 3b is to be annotated based on the relational
schema in FIG. 3a, and the resulting annotated DTD ("DTDSA")
is illustrated in FIG. 5. Fig. 5 shows a DTD annotated in
accordance with the preferred mapping language. The
preferred mapping language includes two types of constructs:
the binding specification and the value specification.

Value specifications

A value specification may be associated only with
a value declaration or a choice declaration.

A value specification is a parameterized formula
containing variables, which, when a data object is
substituted for each variable in it, produces a text value.
The value specification has the following format:

VCD :sf

where VCD is a value or choice declaration, and sf is any
scalar-valued function.

Every value declaration in a DTDSA must have exactly
one associated value specification. Given a value
declaration ("VD") with a value specification ("VS") in some
DTDSA, the semantics of the combination is that in every
document instance of the DTDSA, the value of every
occurrence of VD is determined by VS. As noted earlier, VS
may have parameters.

Every choice declaration in a DTDSA must also have an
associated value specification. Given a choice declaration
CD with a value specification VS in a DTDSA, the semantics
of the combination is that in every document instance of the
DTDSA, the alternative taken in every occurrence of CD is
determined by VS.

Suppose CD = (C1|C2|...|Cn) and CD and VS appear as:

(C1|C2|...|Cn) :VS

There are two possibilities. If the value produced by
VS is an integer i, with i between 1 and n, the alternative
appearing in place of CD is Ci. Alternatively, if the value
produced by VS is a string Cj, which matches one of the
alternatives C1, C2, ..., Cn, the alternative taken in place
of CD is Cj. If the value produced by VS falls in neither
category, the alternative taken in place of CD is undefined.

In actual implementations, a user defined default
alternative or some error reporting string can be used.
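
The two possibilities and the undefined case can be sketched as follows. The function name select_alternative and its default parameter are hypothetical; the default stands in for the user defined default alternative or error reporting string mentioned above.

```python
def select_alternative(alternatives, value, default=None):
    """Pick the alternative of a choice declaration from a value specification result.

    alternatives: the names C1..Cn in declaration order.
    value: the result of the value specification (an integer or a string).
    default: returned when the value selects no alternative (the undefined case).
    """
    # Case 1: an integer i with 1 <= i <= n selects Ci (bool is excluded on purpose).
    if isinstance(value, int) and not isinstance(value, bool):
        if 1 <= value <= len(alternatives):
            return alternatives[value - 1]
    # Case 2: a string equal to one of the alternatives selects that alternative.
    elif isinstance(value, str) and value in alternatives:
        return value
    # Neither category: the result is undefined, so fall back to the default.
    return default
```

For a choice list (SALES | RESEARCH), the integer 2 and the string "RESEARCH" both select RESEARCH, while 5 or "MARKETING" falls through to the default.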

Consider the following example of a DTDSA with a value
specification:

DTDSA JOB_DESCRIPTION:

<!ELEMENT JOB_DESCRIPTION (SALES | RESEARCH) :f(x)>
<!ELEMENT SALES (#PCDATA :"Increase sales volume")>
<!ELEMENT RESEARCH (#PCDATA :"Develop new technology")>

where f(x) has the definition:

f(x) = "SALES",    when x = 1
       "RESEARCH", otherwise

The XML document corresponding to the DTDSA given x=1 is:

<JOB_DESCRIPTION>
<SALES> Increase sales volume </SALES>
</JOB_DESCRIPTION>



Binding specification

A binding specification is a variable and expression
pair. The expression can be a list of data objects, a
formula that identifies a list of data objects, or a
parameterized formula containing variables, which, when a
data object is substituted for each variable in it, produces
a list of data objects. The binding specification has the
following format:

DC ::x1 := vf1 ::x2 := vf2 ... ::xn := vfn

where DC is any DTD construct that is not a value or choice
declaration, xi is a variable, and vfi is a binding function
for i=1,...,n.

A binding specification serves two purposes. First,
when immediately following a repetition symbol, it
determines the number of times the DTD construct qualified
by the repetition symbol repeats in the document instances.
Second, it supplies values to the parameters appearing in
other value or binding functions. The binding function of
this binding specification may itself contain parameters
which obtain values from other binding specifications. This
feature enables a set of mapping constructs to relate to one
another and makes DTDSA flexible enough to represent a large
and diverse set of XML documents.

There are parameters in the mapping constructs of a
DTDSA that do not always obtain their values from other
binding specifications. These parameters are called the
input parameters of the DTDSA, and are used to identify
specific documents among the set of document instances.

Binding variables and function parameters

To understand how binding specifications supply values
to function parameters, it is necessary to introduce the
concepts of ancestral relationships and contexts of DTD
constructs.

Intuitively, if one envisions every DTD construct in a
DTD as a node, and every containment relationship and name
reference relationship as an edge, the DTD will form a
graph. In most cases, this graph is a directed acyclic
graph (DAG), with the root element type declaration of the
DTD being the root of the DAG. The edges of the DAG can be
considered as denoting an ancestral relationship. For
example, in the following DTD

<!ELEMENT A (B, C)>
<!ELEMENT B ...>
<!ELEMENT C ...>

element type definition A can be considered an ancestor (or
parent) of a sequence that is a parent of ename B and ename
C. Then ename B and ename C can be considered ancestors of
element type definition B and element type definition C,
respectively. The ancestral relationship among DTD
constructs can be formally defined based on the parent
relationship defined as follows:

1. An element type declaration is the parent construct of
its content specification.

2. An attribute list declaration is the parent construct
of each of its attribute type declarations.

3. For every DTD construct Ci that is a sub-expression of
a content specification, the smallest super-expression
of Ci that is a DTD construct is its parent.

4. An element name that appears in a content specification
is a parent construct of the element type declaration
with the same element name. The element type
declaration is considered the parent construct of any
attribute list declaration with the same element name.

The transitive closure of the parent relationship is the 
ancestral relationship. The reverse relationship of the 
ancestral relationship is the descendant relationship.

The ancestral relationships so defined do not always
form a DAG. For example, the ancestral relationship in the
following DTD contains a cycle:

<!ELEMENT A (#PCDATA|A)*>

When cycles occur in the ancestral relationship, every DTD
construct in the cycle becomes the ancestor of every other
DTD construct in the cycle. However, the concept of
traversing the relationship graph in either the ancestor or
descendant direction remains useful.
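
The transitive closure of the parent relationship can be sketched as a graph traversal that also terminates when the relationship contains a cycle. The string labels for constructs (for example "ED:A" for the element type declaration of A and "ENAME:B" for the element name B) and the dictionary layout are assumptions made for this illustration only.

```python
def ancestors(construct, parents_of):
    """Return every ancestor of a construct: the transitive closure of 'parent'.

    parents_of maps a construct label to the labels of its parent constructs.
    A visited set keeps the traversal finite when the relationship has cycles.
    """
    seen = set()
    stack = list(parents_of.get(construct, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents_of.get(node, ()))
    return seen

# Parent edges for: <!ELEMENT A (B, C)> <!ELEMENT B ...> <!ELEMENT C ...>
ACYCLIC = {
    "SEQ:(B,C)": ["ED:A"],      # rule 1: declaration -> its content specification
    "ENAME:B": ["SEQ:(B,C)"],   # rule 3: smallest enclosing construct
    "ENAME:C": ["SEQ:(B,C)"],
    "ED:B": ["ENAME:B"],        # rule 4: element name -> declaration of same name
    "ED:C": ["ENAME:C"],
}

# Parent edges for: <!ELEMENT A (#PCDATA|A)*>, which contains a cycle.
CYCLIC = {
    "TCLIST:(#PCDATA|A)*": ["ED:A"],
    "ENAME:A": ["TCLIST:(#PCDATA|A)*"],
    "ED:A": ["ENAME:A"],
}
```

In the cyclic case, every construct on the cycle is reported as an ancestor of every construct on it, including itself, which matches the observation above.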
Some DTD constructs may have more than one parent. As
a result, a single DTD construct may represent XML fragments
in different contexts in the document instances. For
example, in the following DTD

<!ELEMENT A (B, C)>
<!ELEMENT B (D)>
<!ELEMENT C (D)>
<!ELEMENT D (#PCDATA)>

the element type definition D has two parents, one being the
element name D in <!ELEMENT B (D)>, the other that in
<!ELEMENT C (D)>. A document conforming to the above DTD
follows:

<A>
<B><D>first</D></B>
<C><D>second</D></C>
</A>

In the document, the element type definition of D
corresponds to two elements with the same tag D, but in
different contexts, one being the child of element B, the
other that of C.

To discuss the different roles played by the same DTD 
construct, the context of a DTD construct C will be defined 
to be a unique path from the root construct of a DTD to C in 
the descendant direction. Where there is a loop in the DTD,
there can be an infinite number of contexts for some
elements.
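
Contexts can be enumerated as root-to-construct paths. The sketch below simplifies the graph to element level (one node per element type, which is an assumption; the text defines contexts over all DTD constructs) and uses a hypothetical max_len bound so that a loop in the DTD yields a finite prefix of its unbounded set of contexts.

```python
def contexts(root, target, children_of, max_len=10):
    """Yield each path from root to target in the descendant direction.

    children_of maps an element name to the names it references.
    max_len bounds the path length so that loops in the DTD terminate.
    """
    def walk(path):
        node = path[-1]
        if node == target:
            yield tuple(path)
        if len(path) >= max_len:
            return
        for child in children_of.get(node, ()):
            yield from walk(path + [child])

    yield from walk([root])

# Element-level view of: A (B, C), B (D), C (D), D (#PCDATA)
CHILDREN = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
```

For the DTD above, D has exactly the two contexts A-B-D and A-C-D; for a recursive declaration such as A referencing itself, the number of contexts returned grows with max_len.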

Using these concepts, the relationship between binding
variables and function parameters is defined as follows:
Given a value or binding function associated with a DTD
construct in a certain context, a parameter x of the
function gets its value from the value bound to the binding
variable with the same name, if any, in the binding
specification closest to it in context. If such a binding
variable does not exist, x is an input parameter of the
DTDSA.

The DTDSA for the previous DTD will be:

1: <!ELEMENT A (B, C) ::x:=i1 ::y:=i2>
2: <!ELEMENT B (D) ::y:=x+10>
3: <!ELEMENT C (D) ::x:=x+20>
4: <!ELEMENT D (#PCDATA :x+y)>

The virtual XML document represented by this DTDSA with
input parameters of i1=1 and i2=2 can be derived as follows.
Initially at line 1, x and y have the values of 1 and 2,
respectively. y is redefined to 11 at line 2, while x is
redefined to 21 at line 3. The #PCDATA at line 4 has two
contexts. In the context of A-B-D, x gets the value of 1,
and y gets the value of 11, and the value of #PCDATA is thus
12. In the context of A-C-D, x is redefined to 21 while y
remains at 2, and the value of #PCDATA is 23. The whole
corresponding XML document is thus

<A>
<B><D>12</D></B>
<C><D>23</D></C>
</A>
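
The derivation above can be reproduced with a small recursive evaluator in which each element copies the environment of its context, applies its own bindings, and passes the result to its children. The table names BINDINGS, CHILDREN, and VALUE, and the choice to evaluate bindings in written order, are assumptions of this sketch rather than part of the DTDSA definition.

```python
# Binding specifications per element: (variable, function of the environment).
BINDINGS = {
    "A": [("x", lambda env: env["i1"]), ("y", lambda env: env["i2"])],
    "B": [("y", lambda env: env["x"] + 10)],
    "C": [("x", lambda env: env["x"] + 20)],
    "D": [],
}
# Element-level content model: A (B, C), B (D), C (D), D (#PCDATA).
CHILDREN = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
# Value specification attached to the #PCDATA of D.
VALUE = {"D": lambda env: env["x"] + env["y"]}

def render(element, env):
    """Render one element in the context described by env."""
    env = dict(env)                      # each context gets its own copy
    for name, fn in BINDINGS[element]:   # closest binding shadows outer ones
        env[name] = fn(env)
    if element in VALUE:
        body = str(VALUE[element](env))
    else:
        body = "".join(render(child, env) for child in CHILDREN[element])
    return f"<{element}>{body}</{element}>"
```

Calling render("A", {"i1": 1, "i2": 2}) walks both contexts of D and produces the values 12 and 23, because the binding of y at B and the binding of x at C are each visible only below the element that introduces them.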

Determining the number of repetitions 

Let DC denote a DTD construct, x a variable, and vf a
list-valued function producing a list of k values {v1, v2,
..., vk}. The DTD construct with an associated binding
specification (DC)* ::x:=vf can be considered as equivalent
to the sequence (DC ::x:=v1, DC ::x:=v2, ..., DC ::x:=vk).

Formally, given a DTD construct with an associated
binding specification (DC)# ::x:=vf, where # is some
repetition symbol, the DTD construct is considered
equivalent to one of the following, depending on which
repetition symbol # is:

1. For #=*:
If k>=1, (DC)* is equivalent to k consecutive copies of
DC. That is, (DC)* ::x:=vf equiv (DC ::x:=v1, DC ::x:=v2,
..., DC ::x:=vk). If k=0 (i.e. vf evaluates to an empty
list), (DC)* is equivalent to an empty string.

2. For #=+:
If k>=1, (DC)+ is equivalent to k consecutive copies of DC.
That is, (DC)+ ::x:=vf equiv (DC ::x:=v1, DC ::x:=v2,
..., DC ::x:=vk). If k=0, (DC)+ is equivalent to one copy
of DC with x given an undefined value, i.e., (DC)+
::x:=vf equiv (DC ::x:=undefined).

3. For #=?:
If k>=1, (DC)? is equivalent to one copy of DC, and all
except the first value produced by vf are ignored. That
is, (DC)? ::x:=vf equiv (DC ::x:=v1). If k=0, (DC)? is
equivalent to an empty string.

In general, when DC repeats more than once, as required
by one of the rules, each copy of DC sees a different
binding of x. When DC is constrained to appear one (or
zero) times, but vf produces a list of more than one value,
only the first one (or zero) value is used, and all other
values are ignored. On the other hand, in the case where DC
is required to appear at least once, but vf produces 0
values, the value of binding variable x is undefined. In
actual implementations, a user or system defined default
value can be supplied to x.
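
The three rules can be summarized as a function from a repetition symbol and the list produced by vf to the list of values bound, one per copy of DC. The function name expand and the use of None to represent the undefined value are assumptions of this sketch; an implementation could substitute a user or system defined default instead.

```python
UNDEFINED = None  # stand-in for the undefined value of the binding variable

def expand(symbol, values):
    """Return the bindings of x, one per copy of DC, for (DC)# ::x:=vf.

    symbol: the repetition symbol '*', '+', or '?'.
    values: the list v1..vk produced by the binding function vf.
    """
    values = list(values)
    if symbol == "*":
        return values                                 # k copies, none if k = 0
    if symbol == "+":
        return values if values else [UNDEFINED]      # at least one copy
    if symbol == "?":
        return values[:1]                             # first value only, or none
    raise ValueError(f"unknown repetition symbol: {symbol!r}")
```

The length of the returned list is the number of times DC appears in the document instance, so an empty list corresponds to the empty string in rules 1 and 3.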

In these discussions, the symbol ":=" denotes neither
equality nor simple assignment. Rather it binds the list of
values produced by the binding function one after another to
the binding variable. The number of values in the list
produced by the binding function, together with the above
rules, determines the number of times the DTD construct
preceding the repetition symbol repeats in XML document
instances.

Consider the following DTDSA:

<!ELEMENT A (B, C) ::x:=i1 ::y:=i2>
<!ELEMENT B (#PCDATA :y)>
<!ELEMENT C (D)* ::z:=intseq(x)>
<!ELEMENT D (#PCDATA :z)>

where the function intseq(x) produces a sequence of integers
from 1 up to x. The virtual XML document corresponding to
the DTDSA with i1=3 and i2=5 is

<A>
<B> 5 </B>
<C> <D>1</D> <D>2</D> <D>3</D> </C>
</A>
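
A direct, hand-written rendering of this particular DTDSA is sketched below. The function render_a is hypothetical and hard-codes the structure of the four declarations; it is meant only to show how the list returned by intseq drives the number of D elements. The output omits the whitespace shown in the listing above.

```python
def intseq(x):
    """Return the integers from 1 up to x, as described in the text."""
    return list(range(1, x + 1))

def render_a(i1, i2):
    """Render the document for the DTDSA above with input parameters i1 and i2."""
    x, y = i1, i2                                  # ::x:=i1 ::y:=i2 on element A
    d_elements = "".join(f"<D>{z}</D>" for z in intseq(x))   # (D)* ::z:=intseq(x)
    return f"<A><B>{y}</B><C>{d_elements}</C></A>"
```

With i1=0 the binding function produces an empty list, so by rule 1 for "*" the element C is rendered with no D children.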

Some general comments 

The preferred mapping language has the advantage that
it can be used to map a wide variety of underlying scheme
types, not just relational databases. However, in the
following, an illustrative example will be presented which
does use a relational database.

The preferred mapping language also allows data from
multiple sources to be mapped into one single XML document.
These multiple sources can be different data containers from
different types of data systems. However, in the
illustrative example that follows, the sources are various
tables from a relational database.

The mapping has the advantage that it only has
to be established once per DTD (or per XML schema). Mapping XML
documents one by one would be less efficient.

ESTABLISHING A MAPPING 

The procedures for establishing a mapping between a
given DTD and the underlying data sources include performing
the following three parts, where their exact order of
execution is immaterial:

Given a DTD,

1. For each DTD construct that ends with a repetition symbol:

a. Identify a list of data objects, a formula that will
identify a list of data objects, or a parameterized
formula that will identify a list of data objects when
values to the parameters are supplied. For
convenience, this will be called the binding formula.

b. Associate the previous list or formula with a variable
name. For convenience, this variable name will be
called a binding variable, and the binding variable and
binding formula pair a binding specification. After
this step, the binding variable can be used in the
formula in step 1a for other DTD constructs.

c. Associate the binding specification with the DTD construct.

d. Optionally repeat this process.



2. For each DTD construct that does not end with a
repetition symbol and is not a #PCDATA, a choice list or
an attribute definition, optionally associate binding
specifications with it (i.e. perform the Steps 1a, 1b,
1c, and 1d).

3. For each DTD subexpression that is a #PCDATA, a choice
list or an attribute definition:

a. Choose a value, a formula that produces a piece of
text, or a parameterized formula (function) that
will produce a piece of text when the values to the
parameters are supplied. For convenience, this
value or formula is called a value specification.

b. Associate the formula with the DTD subexpression.

Note: Whenever a parameterized formula is used, in either a
value specification or a binding specification, each
parameter in the formula can either be a binding variable
used in a higher level binding specification or otherwise.
Whenever a binding variable is chosen in a binding
specification, the variable can be a parameter used in a
lower level specification or otherwise. These choices will
affect the contents of XML document extraction, as will be
seen later in the XML extraction section.

Alternatively, the above three steps may be performed in any
other order, such as Step 3, Step 1, Step 2, or Step 3,
Step 2, Step 1, etc.
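
One way to check that the three steps have been completed is sketched below. The Construct record, its kind labels, and the annotated dictionary are hypothetical representations chosen for this illustration; the sketch assumes the DTD has already been split into constructs, so that a repeated choice list such as (A|B)* and its inner choice list (A|B) are separate entries.

```python
from dataclasses import dataclass

@dataclass
class Construct:
    text: str               # the DTD text of the construct
    kind: str               # "pcdata", "choice", "attribute", or "other"
    repeated: bool = False  # True when the construct ends with a repetition symbol

def required_annotation(construct):
    """Return the annotation a construct must carry, or None if it is optional."""
    if construct.kind in {"pcdata", "choice", "attribute"}:
        return "value"      # step 3: value specification
    if construct.repeated:
        return "binding"    # step 1: binding specification
    return None             # step 2: binding specification is optional

def missing_annotations(constructs, annotated):
    """List constructs whose required annotation has not been supplied.

    annotated maps construct text to "value" or "binding".
    """
    missing = []
    for construct in constructs:
        needed = required_annotation(construct)
        if needed is not None and annotated.get(construct.text) != needed:
            missing.append(construct.text)
    return missing
```

Because the check only looks at the final set of annotations, it is indifferent to the order in which the steps were carried out.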

Mapping variation

If the set of underlying data includes XML text segments,
the steps in the described method are modified as follows:

1'. For each DTD construct with an ending repetition symbol:

a. Associate a binding specification or an XML-valued value
specification with the DTD construct.

b. Optionally associate more binding specifications with the
DTD construct.

2'. For each DTD construct that does not end with a repetition
symbol and is not a #PCDATA or an attribute definition:

a. Optionally associate a binding specification or an
XML-valued value specification with the DTD construct.

b. Optionally associate more binding specifications with the
DTD construct.

3'. For each DTD construct that is a #PCDATA or an attribute
definition, associate a non-XML-valued value
specification with the DTD construct.

4'. For each choice list DTD construct, associate an
XML-valued or non-XML-valued value specification with the
construct.

FIG. 6A shows an overview of a method for establishing
mapping in DTDSA format according to the present invention.
At 605', a DTD instance 610' is received, and a DTDSA 615'
is generated based on some user options 612'.

Fig. 6b shows an internal flow diagram of block 605'.
Initially the DTD 610' is parsed into some internal format,
e.g., a directed acyclic graph, which is easy to manipulate,
as shown in block 620'. Several traversals are performed to
annotate DTD constructs, which are represented as nodes in
the graph, using value or binding specifications, as shown
in block 630'. The order of operations shown in block 630' is
optional. Any order may be chosen. All three listed
items may include acceptance of user options or inputs for
variable names and formula/function selections.

In item 1 of block 630', every #PCDATA, choice list,
or attribute definition is visited. These are annotated
first with a value specification by choosing variable names
and formula/function. In item 2, nodes representing DTD
constructs that end with a repetition symbol ("*", "+", "?")
are annotated next with binding specifications, by choosing
binding variables, parameter variables, and
formula/function. For all the other nodes, the binding
specification annotation is optional, as shown in item 3.
During item 2 and item 3, the binding variables stay related
to certain parameters based on user options or inputs, as
shown in item 4.

Formatting the DTD graph with annotations, as shown in
block 640', is the next stage in preparing the resulting
DTDSA. A simple recursive technique can traverse the DTD
graph to identify the constructs visited and, at the same
time, in item 1, write the graph out in the original DTD text
format. In accordance with item 2, whenever annotations
associated with a node are found during the traversal, the value
or binding specifications are printed immediately following
the text of the DTD construct that corresponds to the node.
Value specifications are inserted with a prefix and 
binding specifications are inserted with a prefix 
Also, for any binding specification, the binding variable is
inserted first, followed by ":=" and then the binding
formula or function.

******** 

Generating XML documents based on the mapping 

Now the description will proceed from a discussion of
the nature of the mapping and how it is generated to a
discussion of how the mapping is used to generate XML
documents.

When a mapping between the DTD and the data sources has
been established, using the DTDSA technique described above,
XML documents can be created by

(1) using the DTD as a template for building the XML
documents and

(2) supplying values to the parameters in the various
specifications associated with DTD constructs and
then using these specifications as the construction
instructions.

In the preferred embodiment, values are assigned to the
parameters used in the various specifications which are not
also binding variables. Then, starting from the root DTD
element, each DTD element is recursively instantiated into
at least one XML element. In the instantiation process,
value specifications are used

(1) to determine the value to be assigned to each #PCDATA
or attribute definition, and

(2) to determine which child to instantiate in a choice
list.

The binding specifications are used

(1) to determine how many instances a child DTD construct
should be instantiated into when the DTD construct ends
with a repetition symbol, and

(2) to associate values with parameters in the
specifications useful in instantiating descendant DTD
constructs.

More specifically, rules are recursively defined for
instantiating individual DTD constructs when all the needed
parameters (for their corresponding binding or value
specifications) are known. The method for generating XML
documents based on the mapping is as follows:

1. Read the DTD, the mapping, and input values.

2. Prepare input values for the parameters defined in the
last (tail) binding specification of the DTD root
element ED, and make a set of variable/value pairs
called the "environment", Env_0. For every DTD element
there will be a different environment Env_i.

3. Starting from the root element,

a. Using the incoming Env_i, instantiate every DTD
construct (including the root ED) in some tree
traversal order, for example a breadth first search (BFS)
traversal order. In other words, use a
first-in-first-out queue to collect all the
subexpressions that need to be instantiated after
applying the instantiating rules to the current DTD
construct.

b. For an ED or AD with nested binding specifications
(with a potential inner most value specification),
resolve the binding specifications from the tail
(outer most) working towards the head (inner most);
please see the section on the tail absorbing rule,
below. This step resolves and moves all but the head
binding specification into Env_i.

c. Env_i may be modified during steps 3.a and 3.b; pass
along the new Env to all children of the current DTD
construct.

4. Follow step 3 to obtain a result XML document.

Definitions used in the creation of XML documents

A. An environment Env is a set of bindings, e.g., {x=2, y=2,
z=5}.

B. Suppose bspec is a binding specification. Bnd(bspec, n,
Env) denotes the nth binding specified by bspec, under the
environment Env.

C. Eval(E, Env) denotes the function that evaluates the
algebraic expression E using bindings in the environment
Env. If a parameter in E is not in Env, Eval() will prompt
the user to input a value. For example, Eval(x+1, {x=2}) = 3
and Eval("XM"+x, {x="L"}) = "XML". For Eval(y+3, {z=1}),
Eval() will prompt the user to input a value for y.

D. Ival(C, Env) is the function that instantiates any DTD
construct C using the bindings in Env.
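
As an illustrative sketch only, and not part of the disclosed embodiment, the environment of definition A and the Eval function of definition C can be modelled in Python. The name eval_expr, the ask callback that stands in for prompting the user, and the use of Python expression syntax for the algebraic expression E are all assumptions made for this example.

```python
from typing import Any, Callable, Dict

Env = Dict[str, Any]  # an environment: a set of variable/value bindings


def eval_expr(expr: str, env: Env, ask: Callable[[str], Any] = input) -> Any:
    """Evaluate expr under env, asking for any parameter missing from env.

    Hypothetical helper: expr is written in Python syntax and evaluated with
    eval(), which is acceptable only for trusted, illustrative input.
    """
    code = compile(expr, "<expr>", "eval")
    scope = dict(env)  # copy, so the caller's environment is not modified
    for name in code.co_names:  # names referenced by the expression
        if name not in scope:
            scope[name] = ask(name)  # stands in for prompting the user
    return eval(code, {"__builtins__": {}}, scope)


print(eval_expr("x + 1", {"x": 2}))                      # 3
print(eval_expr('"XM" + x', {"x": "L"}))                 # XML
print(eval_expr("y + 3", {"z": 1}, ask=lambda name: 4))  # 7
```

With the default ask=input the supplied value arrives as a string, so a practical version would also need to convert it to the type the expression expects.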

The tail absorbing rule for resolving nested binding
specifications

Suppose DTD construct C has nested binding specifications,
bspec_1, bspec_2, ..., bspec_n, with incoming bindings
specified in Env_n. The rule absorbs all but the first (inner
most) binding specification.

Ival(C + bspec_1 + ... + bspec_n, Env_n)
= Ival(C + bspec_1 + ... + bspec_{n-1}, Env_{n-1})
...
= Ival(C + bspec_1, Env_1),

where Env_i = Env_{i+1} + Bnd(bspec_{i+1}, 1, Env_{i+1}),
for i = n-1, n-2, ..., 1.
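
The tail absorbing rule can be sketched in Python as a fold over the binding specifications from the tail towards the head. This is a minimal illustration under stated assumptions: a binding specification is modelled as a pair of a variable name and a function from an environment to a list of values, and the names bnd and absorb_tail are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

Env = Dict[str, Any]
# Assumed model: (binding variable, function producing the list of bindings).
BSpec = Tuple[str, Callable[[Env], List[Any]]]


def bnd(bspec: BSpec, n: int, env: Env) -> Env:
    """Bnd(bspec, n, Env): the nth (1-based) binding of bspec under env."""
    var, fn = bspec
    return {var: fn(env)[n - 1]}


def absorb_tail(bspecs: List[BSpec], env_n: Env) -> Tuple[BSpec, Env]:
    """Resolve bspec_n, ..., bspec_2 into the environment; keep bspec_1."""
    env = dict(env_n)
    for bspec in reversed(bspecs[1:]):     # tail (outer most) first
        env = {**env, **bnd(bspec, 1, env)}
    return bspecs[0], env                  # head spec and Env_1


specs = [
    ("w", lambda e: [e["r"] * 2, e["r"] * 3]),  # head (inner most)
    ("r", lambda e: [e["x"] + 1]),              # tail (outer most)
]
head, env_1 = absorb_tail(specs, {"x": 100})
print(env_1)               # {'x': 100, 'r': 101}
print(bnd(head, 2, env_1)) # {'w': 303}
```

Only the head specification is left unresolved, so it can still produce several bindings when the construct it annotates is repeated.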

The instantiation rules for individual DTD constructs:
For ease of presentation, the terms in capital letters are DTD
constructs, and the same terms in small letters denote
instances of the corresponding DTD construct. For example, ED
denotes the element type definition construct, and ed is an
instance of ED.

ED and AD with binding specification bspec:

Ival(ed + bspec, Env) = Ival(ed, Env + Bnd(bspec, 1, Env))

Ival(ad + bspec, Env) = Ival(ad, Env + Bnd(bspec, 1, Env))

ED: assume ad_1, ad_2, ..., ad_i are ADs associated with this
ED, i.e., ed, and CS is the content-spec of ed. Also, let
TagEd be the ENAME of ed.

Ival(ed, Env) = "<TagEd" + Ival(ad_1, Env) + Ival(ad_2, Env) + ...
+ Ival(ad_i, Env) + "/>", if CS is EMPTY

or "<TagEd" + Ival(ad_1, Env) + Ival(ad_2, Env) + ... + Ival(ad_i, Env)
+ ">" + Ival(CS, Env) + "</TagEd>", otherwise
CP: assume the content particle cp has a single binding
specification bspec, with k bindings.

• If cp = cu + "*" + bspec,

Ival(cp, Env) = Ival(cu, Env + Bnd(bspec, 1, Env))
+ Ival(cu, Env + Bnd(bspec, 2, Env))
+ ... + Ival(cu, Env + Bnd(bspec, k, Env)), if k > 0
= "", if k = 0

• If cp = cu + "+" + bspec,

Ival(cp, Env) = Ival(cu, Env + Bnd(bspec, 1, Env))
+ Ival(cu, Env + Bnd(bspec, 2, Env))
+ ... + Ival(cu, Env + Bnd(bspec, k, Env)), if k > 0
= user provided default value, if k = 0

• If cp = cu + "?" + bspec,

Ival(cp, Env) = Ival(cu, Env + Bnd(bspec, 1, Env)), if k > 0
= "", if k = 0

CU: assume the content unit cu has a single binding
specification bspec. Let clist denote an instance of choice
list or terminal choice list constructs.

• If cu = clist + bspec,

Ival(cu, Env) = Ival(Alt_k, Env), where Alt_k is the kth
alternative of clist.

• If cu = seq + bspec,

Ival(cu, Env) = Ival(seq, Env + Bnd(bspec, 1, Env))

• If cu = ename + bspec,

Ival(cu, Env) = Ival(ed, Env + Bnd(bspec, 1, Env)), where
ed is the ED that defines ename.

SEQ: assume seq is a sequence of k CPs, i.e., seq = (cp_1,
cp_2, ..., cp_k).

Ival(seq, Env) = Ival(cp_1, Env) + Ival(cp_2, Env) +
... + Ival(cp_k, Env)

PCDATA: assume pcdata (of #PCDATA construct) has a value
specification vspec.

Ival(pcdata + vspec, Env) = Eval(vspec, Env)

AD: assume the attribute list declaration ad has k ATDs,
with atd_i as its ith attribute definition, and ename_i is the
attribute name of atd_i.

Ival(ad, Env) = ename_1 + "=" + Ival(atd_1, Env) + " "
+ ename_2 + "=" + Ival(atd_2, Env) + " "
+ ... + ename_k + "=" + Ival(atd_k, Env)

ATD: assume atd (of ATD construct) has a value specification
vspec.

Ival(atd, Env) = Alt_l, where Eval(vspec, Env) = l, if atd is an enumerated type,
= Eval(vspec, Env), otherwise
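
The instantiation rules above can be illustrated with a small recursive Python sketch. It is a simplification under stated assumptions, not the disclosed implementation: the classes PCData, Particle, Seq and Element, the function ival, the representation of specifications as Python callables, the quoting of attribute values, and the omission of choice lists and of XML escaping are all choices made for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

Env = Dict[str, Any]


@dataclass
class PCData:                       # #PCDATA with a value specification
    vspec: Callable[[Env], Any]


@dataclass
class Particle:                     # CP: content unit, repetition symbol, binding spec
    unit: Any                       # an Element, Seq or PCData
    rep: str = ""                   # "", "*", "+" or "?"
    var: str = ""                   # binding variable
    bspec: Optional[Callable[[Env], List[Any]]] = None
    default: str = ""               # user provided default for "+" when k = 0


@dataclass
class Seq:                          # SEQ: a sequence of content particles
    parts: List[Particle]


@dataclass
class Element:                      # ED with its attribute definitions
    tag: str
    attrs: List[Tuple[str, Callable[[Env], Any]]] = field(default_factory=list)
    content: Optional[Any] = None   # None models an EMPTY content-spec


def ival(c: Any, env: Env) -> str:
    """Instantiate construct c under env, following the rules in the text."""
    if isinstance(c, PCData):
        return str(c.vspec(env))
    if isinstance(c, Seq):
        return "".join(ival(p, env) for p in c.parts)
    if isinstance(c, Particle):
        if c.bspec is None:
            return ival(c.unit, env)
        values = c.bspec(env)                   # the k bindings of bspec
        if not values:                          # k = 0
            return c.default if c.rep == "+" else ""
        if c.rep not in ("*", "+"):             # "?" or no symbol: first binding only
            values = values[:1]
        return "".join(ival(c.unit, {**env, c.var: v}) for v in values)
    if isinstance(c, Element):
        attrs = "".join(f' {name}="{vspec(env)}"' for name, vspec in c.attrs)
        if c.content is None:
            return f"<{c.tag}{attrs}/>"
        return f"<{c.tag}{attrs}>{ival(c.content, env)}</{c.tag}>"
    raise TypeError(f"unsupported construct: {c!r}")


item = Element("item", [("id", lambda e: e["w"])], PCData(lambda e: e["w"] * 2))
order = Element("order", [], Seq([Particle(item, "*", "w", lambda e: [1, 2])]))
print(ival(order, {}))
# <order><item id="1">2</item><item id="2">4</item></order>
```

Each binding of w produces one instance of item, which is how a binding specification on a repeated construct determines the number of generated elements.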

Extract variation:

If the mapping is established using the scheme labeled
Mapping variation, we use a variation of the extract scheme
to generate XML values from the mapping. The extract
variation consists of all the above extraction steps, plus
one additional rule:

Assign XML-text-block: assume cnstr is a DTD construct with
a value spec vspec which identifies or produces an XML text
segment,

Ival(cnstr + vspec, Env) = Eval(vspec, Env)

We can optionally validate whether the value produced by
Eval(vspec, Env) conforms with the DTD in question.

Figs. 7a & b show block diagrams for the XML 
composition algorithm using DTDSA according to the present 
invention. 

In Fig. 7a, a document retrieval and composition 
algorithm 705 receives input parameter name and value pairs 
710, e.g., <A=1, B=100>, and generates a return XML document 
715 based on the provided DTDSA 712. 

An internal flow diagram of the algorithm 705 is shown
in Fig. 7b. Initially, the algorithm parses the DTDSA 712
into some internal format, e.g., a directed acyclic graph,
which is easy to manipulate, and prepares the input
parameters as environmental variables, as depicted in block
720. The algorithm then performs a breadth first search
(BFS) traversal on the internal DTDSA structure, using a
first-in-first-out queue to keep track of the set of
structure nodes visited. The BFS traversal follows a
standard procedure: it sets up initial values (the
document root and initial environmental variables) for the
queue at 725; repeatedly fetches from the queue until the
queue is empty at 730; and, for every node and set of
environmental variables fetched at 735, performs suitable
operations at 740 to generate partial XML components and
adds all the children nodes and new environmental
variables/values to the queue at 745. As shown in block 740,
the operations for a visited node, denoting a data type or
attribute type, include

(1) resolving unbound variables, which are associated
with the data type or attribute and defined in binding
or value specifications in the DTDSA, using the fetched
environmental variables/values (the resolution of
unbound variables may involve accessing data sources
and predefined function calculation);

(2) generating partial XML components based on the current
DTDSA node name (ENAME) as the tag, and the resolved
content as the value or attribute; and

(3) adding the newly created variable/value pairs into
the environmental variables.
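
The queue-driven traversal described above can be sketched in Python with a first-in-first-out queue whose entries carry a node, its environment, and the output element to attach results to. This is a hedged illustration: the dictionary layout of a node ("tag", "bind", "text", "children"), the function name compose, and the use of xml.etree.ElementTree to hold the partial XML components are assumptions, and the sketch assumes the root is instantiated exactly once.

```python
import xml.etree.ElementTree as ET
from collections import deque
from typing import Any, Dict

Env = Dict[str, Any]


def compose(root_spec: Dict[str, Any], env0: Env) -> ET.Element:
    """Build an XML tree by a breadth first traversal of an annotated spec."""
    holder = ET.Element("holder")                      # temporary parent for the root
    queue = deque([(root_spec, dict(env0), holder)])   # block 725: initial values
    while queue:                                       # block 730: until queue is empty
        spec, env, parent = queue.popleft()            # block 735: fetch node and env
        bind = spec.get("bind")                        # (variable, function) or None
        if bind:
            var, fn = bind
            envs = [{**env, var: value} for value in fn(env)]
        else:
            envs = [env]
        for new_env in envs:                           # block 740: one instance per binding
            out = ET.SubElement(parent, spec["tag"])
            if "text" in spec:
                out.text = str(spec["text"](new_env))
            for child in spec.get("children", []):     # block 745: enqueue children
                queue.append((child, new_env, out))
    return holder[0]


spec = {
    "tag": "PO",
    "bind": ("r", lambda e: [{"id": e["x"]}]),
    "children": [
        {"tag": "lineitem", "bind": ("w", lambda e: [1, 2]), "text": lambda e: e["w"]},
    ],
}
doc = compose(spec, {"x": 100})
print(ET.tostring(doc, encoding="unicode"))
# <PO><lineitem>1</lineitem><lineitem>2</lineitem></PO>
```

Because children of one parent are enqueued together and dequeued in the same order, the breadth first order still yields the children of each element in document order.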

FIG. 8a shows the type of directed acyclic graph
generated in item 712. A data composition flow is shown
over the DTDSA directed graphical structure. A data type is
denoted by a circle node as depicted in 835, and terminal
#PCDATA by an oval shaped node 840. The dotted line across
directed edges denotes a choice list for children nodes at
825, and an edge marked with a "*" at 820 denotes a
repeatable and optional child data type in the document. The
initial environmental variables/values 805, ENV_0, are
applied to the document root 835, which has an associated
binding specification D:=f(A) 810, a function that depends
on environmental variable A 808 and produces a value for the
environmental variable D. The resolved D variable/value pair
as shown in 815 is added into the environmental
variables/values, ENV_2 as shown in 812, and passed along to
the child node. Although not shown in the figure, the choice
list illustrated in 825 should have an associated binding
specification whose resolution can lead to finding the child
from the choice list to visit next. The flow will reach the
leaf nodes, such as the #PCDATA node shown in 840, and the
CDATA node for an attribute definition, not shown in the
figure. The leaf nodes have associated value specifications,
e.g., g(D) as shown in 830, which can be resolved using the
incoming environmental variable/value pairs.

FIG. 9 shows a directed acyclic graph, like Fig. 8, but
specifically related to the example of figures 3a, 3b, and
5. An illustrative example is shown of a partial
resolution sequence when an input value 100 is assigned to
variable x, as shown in 800, based on the algorithm shown
in Figs. 7a & b, the DTDSA shown in FIG. 5, and the
relational schema shown in FIG. 3a. A sequence of
resolutions occurs based on the BFS traversal order. The
resolutions at numerals 905, 910, 915, 920, 925, 935, and
945 correspond to the binding/value specs at numerals 505,
510, 515, 520, 525, 535, and 545 respectively. The
resolution for the binding spec at numeral 510 using x=100
involves an access to table PO with poid=100 and yields a
record <100,20,10> for r, as shown in 910. The binding spec
in 505 uses the record r to derive its third argument,
PO.poid(r), as 100, which is needed for the resolution of w,
i.e., row(lineitem,poid,100). Since there are two records in
table lineitem with POID=100, as shown by numerals 330 and
335 in FIG. 3a, w is assigned the two records, as shown at
numeral 905. Such a binding can be used to derive multiple
occurrences of a data type along the edge marked with "*" or
"+", as shown at numeral 902. The two records for variable w
can be used to derive two XML components lineitem, as shown
at numeral 925. Attribute values with a value spec can
be derived similarly. For example, as shown at numeral 920, the
attribute name with a "@" prefix of data type buyer can have
a resolved value s from deriving the binding spec at numeral
535 using r as shown at numeral 935.

Fig. 10 shows the retrieved XML document for the
example depicted in FIGS. 3a, 3b, 5, and 9. Based on the
input x=100, the document is a PO with id 100. There are two
line items retrieved and composed as shown at numerals 1010
and 1015. Attributes are also illustrated as shown at
numerals 1005 and 1010.
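
The resolution of r and w in this example can be traced with a short Python sketch. Only the PO record <100,20,10> and the fact that two lineitem records have POID=100 come from the description; the table contents below, every column name other than poid, and the Python form of row are hypothetical placeholders, since the figures are not reproduced here.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]

# Hypothetical table contents standing in for FIG. 3a.
PO: List[Record] = [
    {"poid": 100, "col2": 20, "col3": 10},   # the record <100,20,10> from the text
    {"poid": 101, "col2": 21, "col3": 11},   # placeholder
]
LINEITEM: List[Record] = [
    {"poid": 100, "line": 1},                # placeholder values
    {"poid": 100, "line": 2},
    {"poid": 101, "line": 1},
]


def row(table: List[Record], column: str, value: Any) -> List[Record]:
    """Return every record of table whose column equals value."""
    return [rec for rec in table if rec[column] == value]


env = {"x": 100}                          # input parameter x=100
r = row(PO, "poid", env["x"])[0]          # r := row(PO, poid, x)
w = row(LINEITEM, "poid", r["poid"])      # w := row(lineitem, poid, PO.poid(r))
print(r)        # {'poid': 100, 'col2': 20, 'col3': 10}
print(len(w))   # 2
```

The two records bound to w correspond to the two lineitem components of the generated document.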

From reading the present disclosure, other 
modifications will be apparent to persons skilled in the 
art. Such modifications may involve other features which 
are already known in the design and use of data conversion 
techniques and XML and which may be used instead of or in 
addition to features already described herein. Although
claims have been formulated in this application to
particular combinations of features, it should be understood
that the scope of the disclosure of the present application 
also includes any novel feature or novel combination of 
features disclosed herein either explicitly or implicitly or 
any generalization thereof, whether or not it mitigates any 
or all of the same technical problems as does the present 
invention. The applicants hereby give notice that new 
claims may be formulated to such features during the 
prosecution of the present application or any further 
application derived therefrom.

The word "comprising", "comprise", or "comprises" as
used herein should not be viewed as excluding additional
elements. The singular article "a" or "an" as used herein
should not be viewed as excluding a plurality of elements.

We claim:

1. A computer method, comprising executing at least the
following operation in at least one data processing device:
establishing a mapping from lists and scalars corresponding
to at least one data source into XML elements and
attributes.

2. At least one medium readable by a data processing device
and embodying at least one result of the method of claim 1.

3. A data processing device comprising:

the at least one medium according to claim 2, and
at least one processor configured to use the at least
one medium to produce an XML document based on the
mapping.

4. The method of claim 1, wherein the at least one data
source comprises at least two data sources, and the data
sources are of different types.

5. At least one medium readable by a data processor and
embodying at least one result of the method of claim 4.

6. A data processing device comprising:

the at least one medium according to claim 5; and
at least one processor configured to use the at least
one medium to produce an XML document based on the
mapping.

7. The method of claim 1, wherein the data source is a 
relational database. 

8. At least one medium readable by a data processing device 

and embodying at least one result of the method of 
claim 7. 

9. A data processing device comprising 

the at least one medium according to claim 8; and 
at least one processor configured to use the at least 
one medium to produce an XML document based on the 
mapping. 

10. The method of claim 1, further comprising executing the 
following operation in the data processing device: 
expressing the mapping in constructs of a mapping language.

11. At least one medium readable by a data processing device
and embodying at least one result of the method of 
claim 10. 

12. A data processing device comprising 
the at least one medium according to claim 11; and 
at least one processor configured to use the at least 
one medium to produce an XML document based on the 
mapping.

13. The method of claim 10, further comprising executing the 
following operation in the data processing device: inserting 
the constructs into a DTD to create an annotated DTD. 

14. At least one medium readable by a data processing device 
and embodying at least one result of the method of 
claim 13. 

15. A data processing device comprising: 

the at least one medium according to claim 14; and

at least one processor configured to 

use the at least one medium to produce an XML 
document based on the mapping; and 
perform the inserting operation. 

16. The method of claim 13, wherein the constructs comprise
at least one of a value specification and a binding
specification.

17. At least one medium readable by a data processing device 

and embodying at least one result of the method of 
claim 16. 

18. A data processing device comprising: 

the at least one medium according to claim 17; and 
at least one processor configured to use the at least 
one medium to produce an XML document based on the 
mapping.

19. The method of claim 13, wherein 

- at least one of the constructs comprises at least one 
parameter; 

- the at least one of the constructs is adapted so that a 
value of the at least one of the parameters is 
determinable at a time of generation of at least one 
respective XML element associated with the at least one 
of the constructs.

20. At least one medium readable by a data processing device

and embodying at least one result of the method of 
claim 19. 

21. A data processing device comprising: 

the at least one medium according to claim 20; and 
at least one processor configured to 

use the at least one medium to produce an XML 

document based on the mapping; and 

pass the value to the parameter. 

22. The method of claim 1, further comprising executing the 
following operation in the data processing device: 
associating values and or formulas with a DTD. 

23. At least one medium readable by a data processing device 

and embodying at least one result of the method of 
claim 22. 

24. A data processing device comprising: 

the at least one medium according to claim 23; and 
at least one processor configured to 

use the at least one medium to produce an XML 

document based on the mapping; and

perform the associating operation.

25. The method of claim 22, wherein the associating includes 
associating one or more lists of data objects or formulas 
producing data objects with each DTD construct having a 
repetition symbol at the end. 

26. At least one medium readable by a data processing device 

and embodying at least one result of the method of 
claim 25. 

27. A data processing device comprising: 

at least one medium according to claim 26; and
at least one processor configured to 

use the at least one medium to produce an XML 

document; and

perform the associating operation. 

28. The method of claim 22, wherein the associating includes 
associating one or more lists of data objects or formulas 
producing data objects with each DTD construct which is not 
a #PCDATA, a choice list, or an attribute list, and does not
end with a repetition symbol.

29. At least one medium readable by a data processing device

and embodying at least one result of the method of 
claim 28. 

30. A data processing device comprising: 

the at least one medium according to claim 30; and 
at least one processor configured to 

use the at least one medium to produce an XML 

document based on the mapping; and 

perform the associating operation. 

31. The method of claim 22, wherein associating includes 
associating a value or formula producing a value with each 
PCDATA, choice list, or attribute definition. 

32. At least one medium readable by a data processing device 

and embodying at least one result of the method of 
claim 31.

33. A data processing device comprising: 

the at least one medium according to claim 32; and 
at least one processor configured to 

use the at least one medium to produce an XML 

document; and

perform the associating operation. 

34. The method of claim 22, wherein associating includes, 
not necessarily in the following order: 

• first associating one or more lists of data objects or
formulas producing data objects with a DTD construct; 

• second associating at least one of the lists or formulas 
with at least one variable name; and 

• using the variable name as a parameter in at least one
other formula. 

35. At least one medium readable by a data processing device 

and embodying at least one result of the method of 
claim 34. 

36. A data processing device comprising: 

the at least one medium according to claim 35; and 
at least one processor configured to 

use the at least one medium to produce an XML 

document; and

perform the associating operation and included
operations.

37. The method of claim 1, further comprising executing the
following operation in the data processing device:
associating at least one respective environment with a
respective XML element to be generated.

38. At least one medium readable by a data processing device 

and embodying at least one result of the method of 
claim 37. 

39. A data processing device comprising: 

the at least one medium according to claim 38; and 
at least one processor configured to 

use the at least one medium to produce an XML 

document; and 

perform the associating operation. 

40. The method of claim 37, wherein the at least one 
environment comprises 

• information from a parent XML element of the respective 

XML element; and 

• information from a binding specification of a DTD 

construct associated with the respective XML element. 



41. At least one medium readable by a data processing device 

and embodying at least one result of the method of 
claim 40. 

42. A data processing device comprising: 

the at least one medium according to claim 41; and 
at least one processor configured to 

use the at least one medium to produce an XML 

document; and 

perform the associating operation. 

43. The method of claim 37, wherein 

• the mapping includes at least one respective 
specification corresponding to at least one respective 
XML element; 

• the specification comprises at least one parameter for 
receiving a value upon generation of an XML document; and 

• the method further comprises, upon generation of an XML 
document, sending the at least one parameter a value 
according to at least one variable/value pair in the at 
least one respective environment.

44. At least one medium readable by a data processing device
and embodying at least one result of the method of 
claim 43. 

45. A data processing device comprising:

the at least one medium according to claim 44; and
at least one processor configured to
use the at least one medium to produce an XML
document; and

perform the associating and sending operations.

46. At least one medium readable by at least one data
processing device and embodying software adapted to perform
operations comprising: establishing a mapping from lists and
scalars corresponding to at least one data source into XML
elements and attributes.

47. The at least one medium of claim 46, wherein the at
least one data source comprises at least two data sources,
and the data sources are of different types.

48. The at least one medium of claim 46, wherein the data 
source is a relational database. 



49. The at least one medium of claim 46, further comprising 
executing the following operation in the data processing 
device: expressing the mapping in constructs of a mapping 
language.

50. The at least one medium of claim 46, further comprising 
executing the following operation in the data processing 
device: inserting the constructs into a DTD to create an 
annotated DTD. 

51. The at least one medium of claim 50, wherein the
constructs comprise at least one of a value specification
and a binding specification.

52. The at least one medium of claim 50, wherein 

- at least one of the constructs comprises at least one 
parameter; and 

- the at least one of the constructs is adapted so that a 
value of the at least one of the parameters is 
determinable at a time of generation of at least one 
respective XML element associated with the at least one 
of the constructs.

53. The at least one medium of claim 46, wherein the
operations further comprise associating values and or 
formulas with a DTD. 

54. The at least one medium of claim 46, wherein the 
associating includes associating one or more lists of data 
objects or formulas producing data objects with each DTD 
construct having a repetition symbol at the end. 

55. The at least one medium of claim 54, wherein the 
associating includes associating one or more lists of data 
objects or formulas producing data objects with each DTD 
construct which is not a #PCDATA, a choice list, or an
attribute list, and does not end with a repetition symbol. 

56. The at least one medium of claim 54, wherein associating 
includes associating a value or formula producing a value 
with each PCDATA, choice list, or attribute definition. 

57. The at least one medium of claim 54, wherein associating 
includes, not necessarily in the following order: 

• first associating one or more lists of data objects or 

formulas producing data objects with a DTD construct;

• second associating at least one of the lists or formulas
with at least one variable name; and 

• using the variable name as a parameter in at least one 
other formula. 

58. The at least one medium of claim 46, wherein the 
operations further comprise associating at least one 
respective environment with a respective XML element to be 
generated. 

59. The at least one medium of claim 58, wherein the at
least one environment comprises

• information from a parent XML element of the respective
XML element; and

• information from a binding specification of a DTD
construct associated with the respective XML element.

60. The at least one medium of claim 58, wherein

• the mapping includes at least one respective
specification corresponding to at least one respective
XML element;

• the specification comprises at least one parameter for
receiving a value upon generation of an XML document; and

• the method further comprises, upon generation of an XML
document, sending the at least one parameter a value
according to at least one variable/value pair in the at
least one respective environment.

61. At least one data processing device comprising: 

- means for receiving data from at least one data source; 

- at least one processor adapted to perform operations 
comprising: establishing a mapping from lists and scalars 
corresponding to the data into XML elements and 
attributes.

62. The at least one data processing device of claim 61, 
wherein the at least one data source comprises at least two 
data sources, and the data sources are of different types. 

63. The at least one data processing device of claim 62, wherein
the data source is a relational database. 

64. The at least one data processing device of claim 61, 
further comprising executing the following operation in the 
data processing device: expressing the mapping in constructs 
of a mapping language.

65. The at least one data processing device of claim 64,
further comprising executing the following operation in the 
data processing device: inserting the constructs into a DTD 
to create an annotated DTD. 

66. The at least one data processing device of claim 64, 
wherein the constructs comprise at least one of a value 
specification and a binding specification.

67. The at least one data processing device of claim 64,
wherein 

- at least one of the constructs comprises at least one 
parameter; and 

- the at least one of the constructs is adapted so that a 
value of the at least one of the parameters is 
determinable at a time of generation of at least one 
respective XML element associated with the at least one 
of the constructs. 

68. The at least one data processing device of claim 61, 
wherein the operations further comprise associating values 
and or formulas with a DTD.

69. The at least one data processing device of claim 68,
wherein the associating includes associating one or more 
lists of data objects or formulas producing data objects 
with each DTD construct having a repetition symbol at the 
end. 

70. The at least one data processing device of claim 68, 
wherein the associating includes associating one or more 
lists of data objects or formulas producing data objects 
with each DTD construct which is not a #PCDATA, a choice
list, or an attribute list, and does not end with a 
repetition symbol. 

71. The at least one data processing device of claim 68, 
wherein the associating includes associating a value or 
formula producing a value with each PCDATA, choice list, or
attribute definition. 

72. The at least one data processing device of claim 68, 
wherein the associating includes, not necessarily in the 
following order: 

• first associating one or more lists of data objects or 
formulas producing data objects with a DTD construct; 

• second associating at least one of the lists or formulas 
with at least one variable name; and 

• using the variable name as a parameter in at least one 
other formula. 

73. The at least one data processing device of claim 61, 
wherein the operations further comprise associating at least 
one respective environment with a respective XML element to 
be generated. 

74. The at least one data processing device of claim 73, 
wherein the at least one environment comprises 

• information from a parent XML element of the respective 
XML element; and 

• information from a binding specification of a DTD 
construct associated with the respective XML element. 

75. The at least one data processing device of claim 73, 
wherein 

• the mapping includes at least one respective 
specification corresponding to at least one respective 
XML element; 

• the specification comprises at least one parameter for 
receiving a value upon generation of an XML document; and 

• the method further comprises, upon generation of an XML 
document, sending the at least one parameter a value 
according to at least one variable/value pair in the at 
least one respective environment. 



ABSTRACT OF THE DISCLOSURE 

A mapping language, insertable into a DTD, allows 
automatic mapping from data sources into XML. A mapping 
results from the establishment of a correspondence between 
entities in a data source on the one hand and lists and 
scalars on the other hand. The language maps the lists and 
scalars to XML elements and attributes. The mapping 
language includes two constructs: the binding specification 
and the value specification. The value specification 
associates with a value or choice declaration. The binding 
specification includes at least one variable/expression 
pair. The constructs are insertable into a DTD to create an 
annotated DTD. 




Ming-Ling Lo et al. 
Y0999-429 WLE 




[Diagram: relational tables PO, company, lineitem, product] 

Figure 3a Example purchase order relational schema 

<!DOCTYPE PO [ 
<!ELEMENT PO (id, buyer, seller, (lineitem)*)> 
<!ELEMENT id (#PCDATA)> 
<!ELEMENT buyer (address)> 
<!ATTLIST buyer 
    name CDATA #REQUIRED > 
<!ELEMENT seller (address)> 
<!ATTLIST seller 
    name CDATA #REQUIRED > 
<!ELEMENT address (#PCDATA)> 
<!ELEMENT lineitem (prodname, amount)> 
<!ELEMENT prodname (#PCDATA)> 
<!ELEMENT amount (#PCDATA)> 
]> 

Figure 3b Example DTD 
<!DOCTYPE PO [ 
<!ELEMENT PO (id, buyer, seller, 
    (lineitem)* :: w := row(lineitem, poid, PO.poid(r)) )> 
    :: r := row(PO, poid, x) 
<!ELEMENT id (#PCDATA : PO.poid(r))> 
<!ELEMENT buyer (address)> :: s := row(company, id, PO.buyer(r)) 
<!ATTLIST buyer 
    name CDATA #REQUIRED : company.name(s) > 
<!ELEMENT seller (address)> :: s := row(company, id, PO.seller(r)) 
<!ATTLIST seller 
    name CDATA #REQUIRED : company.name(s) > 
<!ELEMENT address (#PCDATA : company.addr(s))> 
<!ELEMENT lineitem (prodname, 
    amount)> :: v := row(prod, prodid, lineitem.prodid(w)) 
<!ELEMENT prodname (#PCDATA : product.prodname(v))> 
<!ELEMENT amount (#PCDATA : lineitem.amount(w))> 
]> 

Figure 5 Example DTDSA 
DTD (input) 

* Read in DTD as a graph structure; 

* Scan DTD graph: 
  1. for every #PCDATA, choice list, or attribute type definition visited: 
     annotate the node with a value spec by choosing variable names 
     and formula/function; 
  2. for every DTD construct that ends with a repetition symbol: 
     annotate the node with binding specs by choosing binding 
     variables, parameter variables, and formula/function; 
  3. for every other DTD construct: 
     optionally annotate the node with binding specs by choosing 
     binding variables, parameter variables, and formula/function; 
  4. make sure all parameters but input parameters are defined by 
     relevant binding specs; 

* Format the annotated graph into a DTDSA: 
  1. print the graph in the original order; 
  2. for every annotated node encountered: 
     print value specs with a prefix ":"; 
     print binding specs with a prefix "::" and print ":=" between 
     binding variable and binding formula. 

DTDSA (output) 

Algorithm for establishing the mapping in DTDSA format 
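
By way of non-limiting illustration, step 4 of the scan above (checking that every parameter other than an input parameter is defined by a binding spec) might be sketched in Python as follows. The dictionary layout, the names PO_GRAPH and undefined_parameters, and the assumption that the DTD graph is acyclic are choices made for this sketch only and do not appear in the disclosure. Each binding spec is summarized as a bound variable together with the parameters its formula uses.

```python
# Hypothetical summary of an annotated DTD graph. For each element:
#   "bind":  binding specs as (bound variable, parameters the formula uses)
#   "value": parameters used by the element's value specs
PO_GRAPH = {
    "PO": {"bind": [("r", ["x"])],
           "children": ["id", "buyer", "seller", "lineitem"]},
    "id": {"value": ["r"]},
    "buyer": {"bind": [("s", ["r"])], "value": ["s"], "children": ["address"]},
    "seller": {"bind": [("s", ["r"])], "value": ["s"], "children": ["address"]},
    "address": {"value": ["s"]},
    "lineitem": {"bind": [("w", ["r"]), ("v", ["w"])],
                 "children": ["prodname", "amount"]},
    "prodname": {"value": ["v"]},
    "amount": {"value": ["w"]},
}


def undefined_parameters(graph, root, inputs):
    """Return sorted (element, parameter) pairs that nothing defines.

    A parameter counts as defined if it is an input parameter or is bound
    by a binding spec on the element itself or on one of its ancestors.
    Assumes the graph is acyclic (a recursive DTD would not terminate).
    """
    problems = set()
    stack = [(root, frozenset(inputs))]
    while stack:
        name, defined = stack.pop()
        node = graph[name]
        for var, params in node.get("bind", []):
            problems.update((name, p) for p in params if p not in defined)
            defined = defined | {var}
        problems.update(
            (name, p) for p in node.get("value", []) if p not in defined
        )
        stack.extend((child, defined) for child in node.get("children", []))
    return sorted(problems)


print(undefined_parameters(PO_GRAPH, "PO", {"x"}))
print(undefined_parameters(PO_GRAPH, "PO", set()))
```

With x supplied as an input parameter the sketch reports no undefined parameters; without it, the binding on PO is reported as using an undefined x.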
Inputs: DTDSA; input parameter pairs (A=1; B=100) 
Output: XML 

* read in DTDSA as a graph structure; 

* accept input parameters and add into ENV; 

* add document root + ENV into queue tail; 

* queue not empty? (if false, finish) 

  * retrieve a node + ENV from queue head; 

  * resolve unbound variables in bind/value spec 
    using incoming ENV; 

  * generate partial XML components based on 
    DTDSA Ename (as tag) and resolved content 
    (as value/attribute); 

  * add newly resolved variable/value pairs into 
    ENV; 

  * add all children + new ENV to queue tail. 

FIG. 7b 

Algorithm for generating XML using DTDSA 
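
By way of non-limiting illustration, the queue-driven generation loop above might be sketched in Python as follows. This is a sketch under stated assumptions, not the disclosed implementation: the in-memory form of the annotated DTD (DTDSA), the helpers rows, row, company_node, and generate, and the sample table rows are all hypothetical. In particular, the seller's company id of 10 is invented for the example. Repetition bindings are expanded when children are queued, so that siblings are emitted in document order.

```python
from collections import deque
import xml.etree.ElementTree as ET

# Illustrative rows only; the seller's company id (10) is a made-up value.
TABLES = {
    "PO": [{"poid": 100, "buyer": 20, "seller": 10}],
    "company": [{"id": 20, "name": "CITIBANK", "addr": "NY"},
                {"id": 10, "name": "IBM", "addr": "NY"}],
    "lineitem": [{"poid": 100, "prodid": 35678, "amount": "20K"},
                 {"poid": 100, "prodid": 35694, "amount": "100K"}],
    "prod": [{"prodid": 35678, "prodname": "THINKPAD"},
             {"prodid": 35694, "prodname": "SERVER"}],
}


def rows(table, column, key):
    """All rows of `table` whose `column` equals `key`."""
    return [r for r in TABLES[table] if r[column] == key]


def row(table, column, key):
    """First matching row, for bindings that expect a single row."""
    return rows(table, column, key)[0]


def company_node(role):
    """buyer and seller differ only in which PO column they follow."""
    return {"bind": {"s": lambda e: row("company", "id", e["r"][role])},
            "attrs": {"name": lambda e: e["s"]["name"]},
            "children": [("address", None)]}


# Hypothetical in-memory form of an annotated DTD.
# children: (element name, repetition binding or None).
DTDSA = {
    "PO": {"bind": {"r": lambda e: row("PO", "poid", e["x"])},
           "children": [
               ("id", None), ("buyer", None), ("seller", None),
               ("lineitem",
                ("w", lambda e: rows("lineitem", "poid", e["r"]["poid"])))]},
    "id": {"value": lambda e: e["r"]["poid"]},
    "buyer": company_node("buyer"),
    "seller": company_node("seller"),
    "address": {"value": lambda e: e["s"]["addr"]},
    "lineitem": {"bind": {"v": lambda e: row("prod", "prodid", e["w"]["prodid"])},
                 "children": [("prodname", None), ("amount", None)]},
    "prodname": {"value": lambda e: e["v"]["prodname"]},
    "amount": {"value": lambda e: e["w"]["amount"]},
}


def generate(root, inputs):
    """Breadth-first generation; each queue entry is (name, ENV, parent)."""
    queue = deque([(root, dict(inputs), None)])
    document = None
    while queue:
        name, env, parent = queue.popleft()
        spec = DTDSA[name]
        env = dict(env)  # private copy so siblings do not see our bindings
        for var, formula in spec.get("bind", {}).items():
            env[var] = formula(env)  # resolve binding specs with incoming ENV
        elem = ET.Element(name) if parent is None else ET.SubElement(parent, name)
        if parent is None:
            document = elem
        for attr, formula in spec.get("attrs", {}).items():
            elem.set(attr, str(formula(env)))
        if "value" in spec:
            elem.text = str(spec["value"](env))
        for child, repeat in spec.get("children", []):
            if repeat is None:
                queue.append((child, env, elem))
            else:  # one queued child per item of the bound list
                var, formula = repeat
                for item in formula(env):
                    queue.append((child, {**env, var: item}, elem))
    return document


print(ET.tostring(generate("PO", {"x": 100}), encoding="unicode"))
```

Passing the parent element along with each queue entry is a design choice of the sketch: it lets the partial XML components be attached as soon as they are generated, while the environment copy keeps bindings made for one element from leaking into its siblings.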
[Diagram labels: ENV0 (A="100"; B="Joe"); leaves] 

Figure 8 XML composition data flow 
x = 100 

lineitem: <<100, 35768, 20K>, <100, 35694, 100K>> 

s = <20, CITIBANK, NY> 

1. v := row(prod, prodid, 35678) = <THINKPAD, ..> 
2. v := row(prod, prodid, 35694) = <SERVER, ..> 

company.name(s = <20, CITIBANK, NY>) = CITIBANK 

company.addr(s = <20, CITIBANK, NY>) = NY 

Figure 9 XML composition example with input x=100 
<PO> 
  <id> 100 </id> 
  <buyer name="CITIBANK"> 
    <address> NY </address> 
  </buyer> 
  <seller name="IBM"> 
    <address> NY </address> 
  </seller> 
  <lineitem> 
    <prodname> THINKPAD </prodname> 
    <amount> 20K </amount> 
  </lineitem> 
  <lineitem> 
    <prodname> SERVER </prodname> 
    <amount> 100K </amount> 
  </lineitem> 
</PO> 

Figure 10 Retrieved XML document (with input x 